We consider a general class of multibit adaptive quantizers in which
the quantizer function is modified at every sampling instant according to 
a recursive law with the transitions depending on the value of the quantizer 
output. We obtain a rather comprehensive set of basic properties of the 
device which explain the interrelationship of different aspects of the device 
behavior and their dependence on the parameters of the adaptation algo- 
rithm. For the quantitative analysis of the device, we give formulas and 
bounds for the mean time required for the quantizer function to adapt from 
an arbitrary initial state to the optimal. A feature new with this work is a 
unified treatment and a common body of results for quantizers with both 
bounded and unbounded range. This paper extends all the analytical 
results reported in an earlier paper, which dealt with a restricted class of 
quantizers having only four levels. 

We also present new results from a computational investigation on 
quantizers up to four bits (sixteen levels). These results indicate, for
well-designed examples of the respective classes, the kinds of improvement in
performance that can be expected in going from three-bit (eight-level) to 
four-bit quantizers and from uniform to nonuniform quantizers. 

I. INTRODUCTION 

In a recent paper 1 we obtained a number of fundamental properties 
of a class of two-bit (four-level) adaptive quantizers useful for coding 
speech and other continuous signals with a large dynamic range. We 
also developed formulas for the quantitative analysis of the device. 
In the present paper, we consider a general, multibit adaptive quantizer 
and obtain extensions to all the results previously reported. A feature 
new with this work is a unified treatment and a common body of 
results for quantizers with both bounded and unbounded range, the 
former being the case of practical interest. 

In the final section of the paper, Section IV, we present results from 
a computational investigation on adaptive quantizers up to four bits. 
Readers familiar with quantizers and whose primary interest is in the
performance of the device may skip the earlier sections that contain 
the development of the mathematical results. Section IV includes a 
comparison of the performances of uniform and nonuniform quantizers 
for normally distributed input sequences. 

A quantizer with 2N levels is shown in Fig. 1. In the figure, input
refers to the nth sample of the continuous signal, $x(n)$, where
$n = 0, 1, \ldots$; output refers to the level that is coded before
transmission at that time. We let $\xi_1 = 1$ and call $\Delta$ the step size.*
In uniform quantizers, $\xi_i = i$ and the vertical axis is also subdivided
into equal intervals in the range $(\eta_1\Delta, \eta_N\Delta)$. In adaptive
quantizers, which are of interest here, the step size, and hence the entire
quantizer function, is time-variable, and the step size at the nth sampling
instant is denoted by $\Delta(n)$. The parameters $\{\xi_i\}$ and $\{\eta_i\}$
are predetermined and do not change with time.

In this paper, the main algorithm for step-size adaptation is 

$$\Delta(n+1) = M_i\,\Delta(n) \quad \text{if } \xi_{i-1}\Delta(n) \le |x(n)| < \xi_i\Delta(n), \tag{1}$$

where $M_1, M_2, \ldots, M_N$, called multipliers, are fixed constants. The
following natural restrictions are imposed on the multipliers:

$$M_1 < 1 < M_N \quad \text{and} \quad M_1 \le M_2 \le \cdots \le M_N. \tag{2}$$

Even so, a great deal of the flexibility of the quantizer is incorporated
in the multipliers and, to some extent, in the parameters $\{\xi_i\}$ and
$\{\eta_i\}$. Observe that the algorithm in (1) utilizes only unit memory and
that it is not necessary to transmit to the receiver separate information 
on the step size. 

We shall also be considering the following important variation of
(1) in which the step sizes $\{\Delta(n)\}$ are constrained to be within a
specific bounded interval $[R, L]$; suppose
$\xi_{i-1}\Delta(n) \le |x(n)| < \xi_i\Delta(n)$, then

$$\Delta(n+1) = \begin{cases} M_i\Delta(n) & \text{if } R \le M_i\Delta(n) \le L,\\ R & \text{if } M_i\Delta(n) \le R,\\ L & \text{if } L \le M_i\Delta(n). \end{cases} \tag{3}$$

We call the associated device the saturating adaptive quantizer. There 
are situations where it is attractive to have the interval $[R, L]$
relatively small.
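
The update rules (1) and (3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function name `step_update`, the thresholds `XI`, and the multipliers `M` are hypothetical values chosen only for demonstration.

```python
def step_update(delta, x, xi, multipliers, bounds=None):
    """Return the next step size given the current one and the input sample.

    delta       -- current step size
    x           -- current input sample
    xi          -- interior thresholds [xi_1, ..., xi_{N-1}] in units of delta
                   (xi_0 = 0 and xi_N = infinity are implicit)
    multipliers -- [M_1, ..., M_N]
    bounds      -- optional (low, high) interval for the saturating rule (3)
    """
    # Zero-based index of the magnitude interval occupied by |x|.
    r = sum(1 for t in xi if abs(x) >= t * delta)
    new_delta = multipliers[r] * delta
    if bounds is not None:
        low, high = bounds
        # Clamp to the allowed interval, as in the saturating quantizer.
        new_delta = min(max(new_delta, low), high)
    return new_delta


# Hypothetical three-bit (N = 4) uniform quantizer.
XI = [1.0, 2.0, 3.0]
M = [0.8, 1.0, 1.2, 1.6]
```

Because the update depends only on which interval the quantizer output falls in, a receiver that sees the coded level can run the same function and track the step size without side information.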

The most restrictive assumption that is made about the input
sequence $\{x(n)\}$ is that it is a sequence of independent random
variables (see Sections 1.1 and 1.2 for a discussion). However, in
differential PCM schemes in which the quantizer is used together with a
predicting filter in the feedback loop, the effect of the restriction is
diminished.

* For notational convenience, we also let $\xi_0 = 0$ and $\xi_N = \infty$.

Fig. 1: The quantizer function.


With $\xi_i = i$, the adaptation algorithm in (1) is due to Cummiskey,
Flanagan, and Jayant,2,3 who have also implemented speech coding by
a four-bit quantizer. References 1, 2, and 4 may be consulted for a 
fuller account of the antecedents of the quantizer and related work 
that has been done in this area. Goodman and Gersho 4 have also 
examined the general multibit quantizer from a theoretical point of 
view, and their work complements rather well the work described here. 

We briefly summarize here the main features of this paper. 

(i) The theory that we give here applies to quantizers having 
bounded range and finite alphabet, with the important properties and 
relations holding also for quantizers with unbounded range. However, 
as may be expected, differences do exist between the two types of 
quantizers. For instance, a key relation in the work of Goodman and 
Gersho, 4 who do not consider finite range quantizers, called the design 
equation, holds exclusively for the class they consider. 

(ii) The single most important property of either type of quantizer — 
ordinary or saturating — that we find is a localization property which 
states that, for independent identically distributed inputs, there exists 
a strong localization of the mass of the stationary step-size distribution 
about an easily identifiable central value. See Theorem 1, Section 2.2,
for a statement of this property. The localization property, together 
with certain scaling properties of the central state, provides the key 
to the synthesis of the adaptive quantizers. 

(iii) A property of the quantizers having important implications
is that, under certain conditions, as the range of the multipliers is 
decreased to approach unity, then the stationary step-size distribution 
becomes increasingly concentrated about the central step size. A 
result of this type is given in Ref. 4, where it is shown that a "spread 
function" has the appropriate behavior. However, the definition of the 
spread function is novel, and connections, if any, with the dispersion 
of mass in the distribution are not established. In Section 2.4 we 
establish the property directly in terms of the mass of the distribution. 

(iv) In Section III we develop, as design aids, formulas and bounds 
on the mean adaptation time, i.e., mean time required for the step 
size to adapt from arbitrary initial values to the central step size. 

The mathematical analysis is of a random walk on the integers, in 
which the state transition probabilities depend on the states. Random 
walks of the type considered here are encountered in other areas; for
instance, in various schemes (up-and-down method, transformed
up-and-down method5-7) for estimating a quantile of an unknown
distribution by using only response, nonresponse data, as is required in
bioassay, sensitivity data analysis, and psychological testing. The 
central properties of the random walk that we obtain here are new 
and of general interest. 

1.1 Assumptions and background 

Let $\sigma > 0$ denote a scale parameter and let $\mathcal{G}$ denote an
equivalence class of distributions $F_\sigma(z)$, $z \ge 0$, in which the
distributions are identical to within a scaling operation, i.e.,

$$F_\sigma(\sigma z) = F_1(z). \tag{4}$$

For instance, $\mathcal{G}$ may be the class of half-normal distributions, in
which case $\sigma^2$ is the variance and $F_1(z) = \Pr[|x| \le z]$, where $x$
is normal with zero mean and unit variance. In what follows we let
$\{x_\sigma(n)\}$ denote a sequence of independent random variables, each with
the distribution function $\Pr[|x_\sigma(n)| \le z] = F_\sigma(z)$.

We recall certain known facts about optimal nonadaptive quantization
where $\{x_\sigma(n)\}$ forms the input sequence, $F_\sigma(z)$ is known, and,
for some suitable choice of a fidelity criterion such as
$E[\{y(n) - x_\sigma(n)\}^2]$ where $\{y(n)\}$ is the output of the quantizer,
the optimal step size $\Delta_\sigma$ is computed. With the rms criterion and
the inputs normally distributed, Max8 has computed $\Delta_\sigma$ and, for the
nonuniform case, the corresponding optimal parameters $\{\xi_i\}$, $\{\eta_i\}$
for quantizers with various levels, N. A convenient way of presenting such
results for any $\mathcal{G}$ is as

$$F_1(\Delta_1) = a, \tag{5}$$

where $a$ is some constant, since optimal (nonadaptive) step sizes
$\Delta_\sigma$ corresponding to the scale parameter $\sigma$ are obtained from

$$\Delta_\sigma = \sigma\Delta_1. \tag{6}$$

In this paper we show that, when $\sigma$ is fixed and $\{x_\sigma(n)\}$ forms
the input to the quantizer, then the step size, a random variable evolving
according to either (1) or (3), has a natural center $C_\sigma$. We show, for
instance, that the stationary step-size distribution is localized about
$C_\sigma$ and that the degree of localization may be arbitrarily increased,
although at the cost of other aspects of performance. There are two
important facts to note about $C_\sigma$. First, by virtue of its explicit
definition, $C_1$ can be made to take almost any desired value by suitable
choice of the multipliers. Second, as we show in the following section,
the central step size has a scaling property similar to (6). We are
therefore in a position to incorporate the results of optimal nonadaptive
quantization by identifying $\Delta_1$ with $C_1$.

1.2 Central state 

We consider only quantizers with multipliers having the following
form:

$$M_i = \gamma^{m_i}, \quad i = 1, 2, \ldots, N, \tag{7}$$

where $\gamma$ is some real number greater than 1 and the $m_i$'s take integral
values. With (2), this implies

$$m_1 < 0 < m_N \quad \text{and} \quad m_1 \le m_2 \le \cdots \le m_N. \tag{2'}$$

We shall further take the set of $m_i$'s to be relatively prime, i.e., their
greatest common divisor is 1. If, as we shall assume, the initial step
size is of the form $\gamma^i$, $i$ integral, then the step size is always of
that form and the space of possible step sizes forms a lattice.

Consider an independent identically distributed input sequence
$\{x_1(n)\}$, where $\Pr\{|x_1(n)| \le z\} = F_1(z)$ and $F_1(\cdot)$ is an
element of $\mathcal{G}$. We drop the subscript that identifies the scaling.
For $z \ge 0$, let*

$$B(z) \triangleq \sum_{r=1}^{N} m_r\{F(\xi_r z) - F(\xi_{r-1} z)\}. \tag{8}$$

Since it is also true that

$$B(z) = m_N - \sum_{r=1}^{N-1}(m_{r+1} - m_r)F(\xi_r z), \tag{8'}$$

it is clear that $B(z)$ is a monotonic, strictly decreasing function of $z$;
also, $B(0) = m_N > 0$ and $B(z) \to m_1 < 0$ as $z \to \infty$. Hence, there
exists a unique integer $i$ with the property that

$$B(\gamma^{i-1}) > 0 \ge B(\gamma^i). \tag{9}$$

We denote $\gamma^i$ by $C$ and refer to it as the central step size. All step
sizes are considered to be of the form $C\gamma^i$, $i = 0, \pm 1, \pm 2, \ldots$.

* $F(0) = 0$, $F(z) \to 1$ as $z \to \infty$ and $F(z)$ is monotonic, strictly increasing with $z$.
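
The definition of the central step size in (8) and (9) lends itself to a direct numerical check. The sketch below assumes a half-normal input law (so that $F$ can be written with `math.erf`) and uses hypothetical parameters `GAMMA`, `XI`, and `M_LOG`; none of these values are taken from the paper.

```python
import math


def F(z):
    """Half-normal distribution function Pr[|x| <= z] for x ~ N(0, 1)."""
    return math.erf(z / math.sqrt(2.0))


def B(z, xi, m):
    """B(z) of eq. (8); xi = [xi_1..xi_{N-1}], with xi_0 = 0 and xi_N = infinity."""
    cdf = [0.0] + [F(t * z) for t in xi] + [1.0]
    return sum(m[r] * (cdf[r + 1] - cdf[r]) for r in range(len(m)))


def central_index(gamma, xi, m):
    """Unique integer i with B(gamma**(i-1)) > 0 >= B(gamma**i), eq. (9)."""
    i = 0
    # Move right while the drift function is still positive at gamma**i.
    while B(gamma ** i, xi, m) > 0:
        i += 1
    # Move left while the left neighbour is already nonpositive.
    while B(gamma ** (i - 1), xi, m) <= 0:
        i -= 1
    return i


# Hypothetical example: N = 4, uniform thresholds, integer log-multipliers.
GAMMA = 1.1
XI = [1.0, 2.0, 3.0]
M_LOG = [-1, 0, 1, 2]
```

The search terminates because $B$ is strictly decreasing from $m_N > 0$ to $m_1 < 0$; the central step size is then `GAMMA ** central_index(GAMMA, XI, M_LOG)`.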

Remarks:

(i) The parameters $\{m_i\}$ and $\gamma$ may be selected to make the resulting
central step size $C$ approximate as closely as desired any given real
positive number, $\Delta$. First, by making $\gamma$ close to unity the grid of
possible step sizes can be made sufficiently fine. Second, the integral
parameters $\{m_i\}$ can be chosen to make
$\sum_r m_r\{F(\xi_r\Delta) - F(\xi_{r-1}\Delta)\}$ sufficiently small.

(ii) So far, we have been concerned with the central step size for
the probability distribution $F(z)$, corresponding to the particular scale
parameter $\sigma = 1$. To demonstrate the behavior of the central step
size with various scale parameters, let $C_\sigma$ denote the central step size
corresponding to the input probability distribution, $F_\sigma(z)$, and let
$B_\sigma(z)$ be defined like $B(z)$ in (8) with $F(\cdot)$ replaced by
$F_\sigma(\cdot)$. Let $\hat{C}_\sigma$ be the unique solution of

$$B_\sigma(\hat{C}_\sigma) = 0, \tag{10}$$

where, of course, $\hat{C}_\sigma$ may not be of the form $\gamma^i$, $i$
integral. However,

$$C_\sigma/\gamma < \hat{C}_\sigma \le C_\sigma. \tag{11}$$

We observe that $\hat{C}_\sigma$ scales, i.e.,

$$\hat{C}_\sigma = \sigma\hat{C}_1. \tag{12}$$

The above follows from the following property of the functions
$\{B_\sigma(\cdot)\}$:

$$B_\sigma(\sigma z) = B_1(z).$$

From (11) and (12),

$$C_\sigma/\gamma < \sigma\hat{C}_1 \le C_\sigma, \tag{13}$$

and it is in this sense that we say that the central step size scales.
1.3 Basic equations

We define a Markov chain and obtain the transition equations for
the ordinary quantizer with the inputs being $\{x(n)\}$, which are
independent identically distributed, and $\Pr\{|x(n)| \le z\} = F(z)$. Let

$$\omega(n) = \log_\gamma \Delta(n) - \log_\gamma C,$$

so that

$$\omega(n+1) = \omega(n) + m_r \quad \text{if } \xi_{r-1}C\gamma^{\omega(n)} \le |x(n)| < \xi_r C\gamma^{\omega(n)}, \tag{14}$$

where $1 \le r \le N$. We have in (14) a Markov chain on $0, \pm 1, \pm 2, \ldots$,
with the central step size $C$ corresponding to the 0 state. Let

$$p(i; n) = \Pr[\omega(n) = i].$$

The state transition equations are

$$p(i; n+1) = \sum_{r=1}^{N} b^{(r)}(i - m_r)\,p(i - m_r; n), \tag{15}$$

where the transition probabilities are

$$b^{(r)}(i) \triangleq F(\xi_r C\gamma^i) - F(\xi_{r-1}C\gamma^i), \quad 1 \le r \le N. \tag{16}$$

The qualitative results that we obtain are based on the following
two relations that do not depend on the particular distribution $F(z)$.

$$\text{(i)}\quad 0 \le F(\xi_r C\gamma^i) < F(\xi_r C\gamma^{i+1}) \le 1 \quad \text{for all } i \text{ and } 1 \le r \le (N-1). \tag{17}$$

$$\text{(ii)}\quad \sum_{r=1}^{N} m_r b^{(r)}(-1) > 0 \ge \sum_{r=1}^{N} m_r b^{(r)}(0). \tag{18}$$

The latter condition follows from the definition of the central step size.
The 0 state of the random walk has the following important property:
There is a net drift to the left (right) from states to the right
(left) of the 0 state.

$$E[\omega(n+1) \mid \omega(n) = i] - i = \sum_{r=1}^{N} m_r b^{(r)}(i) \;\begin{cases} < 0 & \text{if } i > 0,\\ > 0 & \text{if } i < 0. \end{cases} \tag{19}$$

The above super- and submartingale properties are the basis for the
existence of a stochastic Liapunov function (Appendix A) and the
bound given in Section 3.2.
1.4 Saturating adaptive quantizer

Any hardware implementation of the quantizers will incorporate 
some scheme for restricting the range of step sizes. In addition, there 
are reasons for desiring the step size to be bounded. For instance, by 
limiting the step sizes at both ends, it is possible to devise automatic 
schemes for "forgetting" the effects of past channel errors.9 In such
algorithms, the step size may be bounded to fairly small intervals. 

For the saturating adaptive quantizer, eq. (3), suppose that

$$\xi_{r-1}C\gamma^{\omega(n)} \le |x(n)| < \xi_r C\gamma^{\omega(n)}$$

for some $r$, $1 \le r \le N$. We obtain the following equation analogous
to (14):

$$\omega(n+1) = \begin{cases} \omega(n) + m_r & \text{if } -K \le \omega(n) + m_r \le L,\\ -K & \text{if } \omega(n) + m_r \le -K,\\ L & \text{if } L \le \omega(n) + m_r, \end{cases} \tag{20}$$

where $K$ and $L$ are fixed positive integers. The ordinary quantizer is
obtained if $K, L \to \infty$.

We observe the following: The central state for the saturating
adaptive quantizer may be defined exactly as in the ordinary type of
quantizer; the important martingale properties, expressed in eq. (19)
for the ordinary quantizer, carry over to the saturating type. The
time-dependent transition equations of the saturating quantizer are
characterized by numerous involved boundary equations. However,
the bulk of the equations are of the form given in (15):

$$p(i; n+1) = \sum_{r=1}^{N} b^{(r)}(i - m_r)\,p(i - m_r; n), \quad -K + m_N \le i \le L + m_1. \tag{15'}$$

We do not give the remaining equations since we have no direct need 
for the time-dependent equations. In Appendix B we give, following 
the method and notation of Section 2.1, a complete set of reduced 
equations satisfied by the stationary probabilities. 
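
As an illustration of the saturating chain (20), the following sketch builds the transition probabilities (16) and approximates the stationary distribution by repeatedly applying the transition law. It is only a numerical sketch under stated assumptions: the half-normal input law, the parameter values, and the central step size `C = GAMMA ** -6` (the value obtained for these assumed parameters from the definition in Section 1.2) are all hypothetical, and power iteration is a stand-in for the reduced equations of Appendix B.

```python
import math


def F(z):
    """Half-normal distribution function Pr[|x| <= z], x ~ N(0, 1) (assumed input law)."""
    return math.erf(z / math.sqrt(2.0))


def transition_probs(i, C, gamma, xi):
    """b^(r)(i), r = 1..N, as in eq. (16); xi holds xi_1..xi_{N-1}."""
    z = C * gamma ** i
    cdf = [0.0] + [F(t * z) for t in xi] + [1.0]
    return [cdf[r + 1] - cdf[r] for r in range(len(cdf) - 1)]


def stationary_distribution(C, gamma, xi, m, K, L, sweeps=2000):
    """Power iteration on the saturating chain of eq. (20) over states -K..L."""
    states = range(-K, L + 1)
    probs = {i: transition_probs(i, C, gamma, xi) for i in states}
    p = {i: 1.0 / (K + L + 1) for i in states}
    for _ in range(sweeps):
        nxt = dict.fromkeys(states, 0.0)
        for i in states:
            for m_r, b_r in zip(m, probs[i]):
                j = min(max(i + m_r, -K), L)  # saturate at the boundaries
                nxt[j] += p[i] * b_r
        p = nxt
    return p


# Hypothetical example parameters (not taken from the paper).
GAMMA = 1.1
XI = [1.0, 2.0, 3.0]
M_LOG = [-1, 0, 1, 2]
C = GAMMA ** -6
```

Calling `stationary_distribution(C, GAMMA, XI, M_LOG, K=15, L=15)` returns a dictionary mapping each state to its approximate stationary probability; with these assumed parameters the mass concentrates near the 0 state.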

II. STATIONARY DISTRIBUTIONS 

Appendix A establishes the existence and uniqueness of a finite 
stationary distribution for the step size in the quantizers. The following 
sections establish the main qualitative properties of the stationary 
distributions for both the ordinary and saturating adaptive quantizers. 

If we set $p(i; n+1) = p(i; n) = p(i)$ in the time-dependent
equations, then the stationary probabilities are given by $\{p(i)\}$. Thus,
the stationary probabilities of the ordinary adaptive quantizer are
obtained from

$$p(i) = \sum_{r=1}^{N} b^{(r)}(i - m_r)\,p(i - m_r), \quad i = 0, \pm 1, \pm 2, \ldots \tag{21}$$

and the normalization equation,

$$\sum_{i=-\infty}^{\infty} p(i) = 1.$$



2.1 A useful reduction of the equations for stationary probabilities

In each equation in (21), the maximum difference in the indices of
the state probabilities is $(m_N - m_1)$. By exploiting a property of the
stationary distribution, we now obtain a set of new equations where
the maximum difference in the indices is $(m_N - m_1 - 1)$. The reduced
set of equations together with the normalization equation is complete.
A simple interpretation and the motivation of the reduced equation
is given in Ref. 1; remark (ii) below gives an additional probabilistic
interpretation. The reduced equations are important to us, as they
allow us to consider only a smaller set of solutions.

For any integral $j$,

$$\begin{aligned} \sum_{i=-\infty}^{j} p(i) &= \sum_{i=-\infty}^{j}\sum_{r=1}^{N} b^{(r)}(i - m_r)\,p(i - m_r)\\ &= \sum_{i=-\infty}^{j-m_N}\Big\{\sum_{r=1}^{N} b^{(r)}(i)\Big\}p(i) + \sum_{i=j-m_N+1}^{j-m_{N-1}}\Big\{\sum_{r=1}^{N-1} b^{(r)}(i)\Big\}p(i) + \cdots + \sum_{i=j-m_2+1}^{j-m_1} b^{(1)}(i)\,p(i). \end{aligned}$$

Since $\sum_{r=1}^{N} b^{(r)}(i) = 1$, the above reduces to

$$\sum_{i=j-m_N+1}^{j} p(i) = \sum_{r=1}^{N-1}\sum_{i=j-m_{r+1}+1}^{j-m_r}\Big\{\sum_{s=1}^{r} b^{(s)}(i)\Big\}p(i). \tag{22}$$

Define for $1 \le r \le N$ and all integral $i$,

$$\psi^{(r)}(i) \triangleq \sum_{s=1}^{r} b^{(s)}(i). \tag{23}$$

The quantities $\{\psi^{(r)}(i)\}$ may be directly obtained from the input
distribution, since $\psi^{(r)}(i) = F(\xi_r C\gamma^i)$. From (22) we obtain the
reduced equations

$$\sum_{i=j-m_N+1}^{j} p(i) = \sum_{r=1}^{N-1}\sum_{i=j-m_{r+1}+1}^{j-m_r}\psi^{(r)}(i)\,p(i), \quad j = 0, \pm 1, \pm 2, \ldots. \tag{24}$$

In these equations, the set $[j - m_{r+1} + 1,\ j - m_r]$ is to be treated as
empty if $m_r = m_{r+1}$.

Remarks:

(i) The manipulations leading to (24) are justified since they involve
bounded quantities, as is implied by the existence of a unique finite
stationary distribution.

(ii) Equation (24) is equivalent to the following identity, which is
intuitively plausible and may be proven independently:

$$\Pr_s[\omega(n) \le j \text{ and } \omega(n+1) \ge j+1] = \Pr_s[\omega(n+1) \le j \text{ and } \omega(n) \ge j+1],$$

where the subscript $s$ is being used to identify stationary probabilities.

(iii) Equation (24) may be used to give a simple proof of an identity
(called simply an identity in Ref. 1 and "the design equation" in
Ref. 4) involving the stationary state probabilities of the ordinary
quantizer. Sum both sides of (24) for all integral $j$:

$$\sum_{j=-\infty}^{\infty}\sum_{i=j-m_N+1}^{j} p(i) = \sum_{j=-\infty}^{\infty}\sum_{r=1}^{N-1}\sum_{i=j-m_{r+1}+1}^{j-m_r}\psi^{(r)}(i)\,p(i).$$

The left-hand side is simply $m_N$ and the right-hand side is

$$m_N - \sum_{r=1}^{N} m_r q_r,$$

where

$$q_r = \sum_{i=-\infty}^{\infty}\{\psi^{(r)}(i) - \psi^{(r-1)}(i)\}\,p(i).$$

Hence,

$$\sum_{r=1}^{N} m_r q_r = 0. \tag{25}$$

Equation (25) has a natural interpretation if we recognize that $q_r$ is
the stationary rth step occupancy probability, i.e.,

$$q_r = \Pr_s[\xi_{r-1}\Delta(n) \le |x(n)| < \xi_r\Delta(n)]. \tag{26}$$

The steps leading to eq. (24) may be repeated for the saturating
adaptive quantizer, and a similar reduction may be achieved. These
equations are given in Appendix B. The main recursion is identical
to that of the ordinary quantizer, namely, eq. (24), and holds for all
integral $j$, $-K + m_N \le j \le L + m_1 + 1$. Observe that the range
over which (24) is valid, for the saturating quantizer, is such that
every state probability is included in at least one component of the
recursion.

It may be verified by the reader that the identity in (25), the design 
equation of Ref. 4, does not hold for the saturating quantizer. 

2.2 Localization property of the stationary distribution 

We prove a fundamental distribution-free property of the stationary
distribution of the step size. For both the ordinary and the saturating
adaptive quantizers, we obtain sharp geometric bounds on almost all
the stationary state probabilities as a function of the distance of the
state from the 0 state. The actual bounds obtained are somewhat
stronger than the above statement implies, since the rate parameter in
the geometric bound itself decreases monotonically with increasing
distance from the 0 state. These bounds show that a strong localization
of the mass of the stationary distribution about the 0 state (central
step size) is inherent in the random walk. Also, we found that it was
necessary to prove a result like Theorem 1 before the effects of the
multipliers on the dispersion of the stationary distribution could be
quantified.

It is necessary to define certain vectors and matrices of dimensions
$(m_N - m_1 - 1)$ and $(m_N - m_1 - 1) \times (m_N - m_1 - 1)$, respectively.
Let $P_i$ denote the column vector with the following components:*

$$P_i' = [p(i),\ p(i+1),\ \ldots,\ p(i + m_N - m_1 - 2)]'. \tag{27}$$

Equation (24) may be used to construct matrices $\{A_i\}$, which govern
the transitions of the above vectors in the following manner:

$$P_{i+1} = A_i P_i. \tag{28}$$

By examining (24) we observe that the elements of $A_i$ depend on the
quantities $\psi^{(r)}(i), \ldots, \psi^{(r)}(i + m_N - m_1 - 1)$, $1 \le r \le N$,
and the subscript $i$ indicates this dependence.

Theorem 1 (Localization Property): Let $i > 0$. For both the ordinary and
saturating adaptive quantizers, there exists a constant weight vector with
positive elements, $\lambda$, and a constant, $r > 1$, depending only on $A_i$
such that, for all $j \ge i$,

$$(\lambda' P_j) \le (1/r)^{j-i}(\lambda' P_i). \tag{29}$$

There exists the $L_1$-norm, $|x| \triangleq \sum_k \lambda_k |x_k|$, of the
vectors $\{P_j\}$ which decreases geometrically as $|j - i|$ increases.

An identical statement with $|j - i|$ replacing the index $j - i$ in
(29) is also true for $i < 0$ and all $j \le i$.

* The superscript ' denotes the transpose.

Remarks*:

(i) When $r$ and $\lambda$ in (29) are as constructed by us in the proof of
the theorem, then the inequality in (29) becomes an equality if $A_k = A_i$
for $k = i, i+1, \ldots, j$. This indicates that it is not possible to obtain
tighter geometric bounds without making further assumptions on the
distribution $F(z)$.

Using Theorem 1, we can give the following pointwise bound on
the stationary state probabilities for both the ordinary and saturating
adaptive quantizers:† let $i > 0$; then, for $j \ge i$,

$$p(j + m_N - m_1 - 2) \le (1/r)^{j-i}(\mathbf{1}' P_i) \le (1/r)^{j-i}. \tag{30}$$

Similarly, for $i < 0$ and all $j \le i$,

$$p(j - m_N + m_1 + 2) \le (1/r)^{|j-i|}(\mathbf{1}' P_i) \le (1/r)^{|j-i|}. \tag{30'}$$

The proof of (30) is as follows. Let $\lambda_m$ denote the largest element of
the vector $\lambda$ occurring in Theorem 1 so that $1 \le m \le m_N - m_1 - 1$.
From Theorem 1,

$$\lambda_m\,p(j + m - 1) \le \lambda' P_j \le (1/r)^{j-i}(\lambda' P_i) \le (1/r)^{j-i}\lambda_m(\mathbf{1}' P_i),$$

and the inequalities in (30) follow.
Remarks:

(ii) Observe that for the bounds in (29) and (30) we may use any
$i$, $0 < i \le j$, as the reference state. The choice of the best reference
state depends on the behavior of $r$ with $i$ which, in turn, depends on
the distribution $F(z)$. The main distribution-free property of $r(i)$,
namely, statement (iii) of Lemma 1, indicates an advantage of choosing
a large $i$ for the reference state. In Section 2.4, we prove an assertion
by implicitly using more than one reference state $i$.

The proof of Theorem 1 relies on two lemmas that we state here
and prove in Appendix C.‡

* This remark implies the tightness of the bound in (29), which is lacking for the
bound obtained in Ref. 1 for the two-bit quantizer.

† The vector $\mathbf{1}$ has every element equal to unity.

‡ Observe that neither $A_i$ nor $A_i^{-1}$ is a nonnegative matrix so that the usual
Frobenius theory does not apply.
Lemma 1: For every $i > 0$,

(i) $A_i$ is nonsingular and $A_i^{-1}$ has a unique positive real eigenvalue,
say, $r$. Furthermore, $r > 1$.
(ii) Every element of the corresponding left eigenvector of $A_i^{-1}$,
$\lambda$, is of the same sign and nonzero, hence we may take $\lambda$ to be a
positive vector.
(iii) $r$, which depends on $i$, is monotonic, strictly increasing with $i$.

Lemma 2: For $j \ge i > 0$,

$$\lambda'[A_j^{-1} - A_i^{-1}]P_{j+1} \ge 0. \tag{31}$$

Remarks:

(iii) It is not the case that $\lambda'[A_j^{-1} - A_i^{-1}] \ge 0$, so that (31)
is not true if $P_{j+1}$ is taken to be an arbitrary nonnegative vector.* In
proving Lemma 2 it is necessary to take into account the fact that the
vector $P_j$, from which $P_{j+1}$ evolves according to eq. (28), is itself
nonnegative, and this implies that $P_{j+1}$ is restricted to a cone that is
a proper subset of the nonnegative quadrant.

Proof of Theorem 1: For $j \ge i > 0$,

$$\begin{aligned} \lambda' P_j = \lambda' A_j^{-1}P_{j+1} &= \lambda'[A_j^{-1} - A_i^{-1}]P_{j+1} + \lambda' A_i^{-1}P_{j+1}\\ &= \lambda'[A_j^{-1} - A_i^{-1}]P_{j+1} + r\lambda' P_{j+1} \quad \text{from Lemma 1}\\ &\ge r\lambda' P_{j+1} \quad \text{from Lemma 2.} \end{aligned} \tag{32}$$

Hence, $(\lambda' P_j) \le (1/r)^{j-i}(\lambda' P_i)$ for all $j \ge i$, as was to
be proved.

As every element of $P_j$ is nonnegative, the $L_1$-norm $|P_j|$ is equal
to $\lambda' P_j$. Finally, we may transfer the result that holds for $i > 0$ to
the case of $i < 0$ by a simple renumbering of states in the manner that
has been indicated in Ref. 1.

The notation common with Ref. 1 conceals some rather significant
differences in both the main result (29) and its proof. In Ref. 1, the
corresponding result involved $\lambda$ and $r$, which were elements of the
eigensystem of an additional matrix A, obtained in an involved way
from $A_i$. The result in Lemma 2 has no counterpart in Ref. 1. The
geometric bound obtained in Ref. 1 is peculiar to two-bit (N = 2)
quantizers, and does not directly generalize. Also, the bound obtained
here is stronger even for the case N = 2.

2.3 Lower bounds on the steepness factors, r(i) 

Theorem 1 and the subsequent bound in (30) indicate that $r(i)$
is a local measure of the rate with which the stationary probabilities
change, and for this reason we find it natural to call $r(i)$ the local
steepness factor. Here we go back to the definition of $r(i)$ as being the
unique positive real root of the polynomial C(m), eq. (60), to obtain
the following bound on $r(i)$, which has the advantages of being explicit
and being dependent only on the transition probabilities at state
$i$. We make free use of this bound in the following section.

* A vector is nonnegative if every element is nonnegative. The nonnegative
quadrant in $R^n$ is the set of all nonnegative vectors of dimension n.

$$r(i) \ge \rho(i) = \left[\frac{\sum_{r=1}^{\mu}(-m_r)\{\psi^{(r)}(i) - \psi^{(r-1)}(i)\}}{\sum_{r=\mu+1}^{N} m_r\{\psi^{(r)}(i) - \psi^{(r-1)}(i)\}}\right]^{1/(m_N - m_1 - 1)}, \tag{33}$$

where, of the N multipliers, only $\mu$ multipliers have values not exceeding
unity, i.e.,

$$m_1, m_2, \ldots, m_\mu \le 0$$

and

$$m_{\mu+1}, m_{\mu+2}, \ldots, m_N > 0.$$

The bound $\rho(i)$ has certain interesting properties. First, observe that,
by virtue of the definition of the central state [eqs. (8) and (9)],
$\rho(i) > 1$ for all $i > 0$. Also, the sequence $\rho(i)$ is, like $\{r(i)\}$,
monotonic, increasing with $i$. The numerator and denominator of the
bracketed expression have interesting probabilistic interpretations: The
numerator (denominator) is the expected change in state conditional on the
transition being from state $i$ to all states $i' \le i$ ($i' > i$).

The proof of eq. (33) is involved, and for the sake of brevity we 
omit giving it. 
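
The bound can be evaluated directly from the transition probabilities at a single state. The sketch below assumes the ratio form described in the text (expected downward change over expected upward change, raised to the power $1/(m_N - m_1 - 1)$), a half-normal input law, and hypothetical parameter values; the function names are illustrative and the central step size `C` is the one implied by these assumed parameters.

```python
import math


def F(z):
    """Half-normal distribution function Pr[|x| <= z], x ~ N(0, 1) (assumed input law)."""
    return math.erf(z / math.sqrt(2.0))


def psi(r, i, C, gamma, xi):
    """psi^(r)(i) = F(xi_r * C * gamma**i), with psi^(0) = 0 and psi^(N) = 1."""
    if r == 0:
        return 0.0
    if r == len(xi) + 1:
        return 1.0
    return F(xi[r - 1] * C * gamma ** i)


def rho(i, C, gamma, xi, m):
    """Lower bound on the local steepness factor at state i (ratio form assumed)."""
    n_levels = len(m)
    b = [psi(r, i, C, gamma, xi) - psi(r - 1, i, C, gamma, xi)
         for r in range(1, n_levels + 1)]
    # Expected decrease in state over transitions that do not move right.
    down = sum(-m_r * b_r for m_r, b_r in zip(m, b) if m_r <= 0)
    # Expected increase in state over transitions that move right.
    up = sum(m_r * b_r for m_r, b_r in zip(m, b) if m_r > 0)
    return (down / up) ** (1.0 / (m[-1] - m[0] - 1))


# Hypothetical example parameters (not taken from the paper).
GAMMA = 1.1
XI = [1.0, 2.0, 3.0]
M_LOG = [-1, 0, 1, 2]
C = GAMMA ** -6
```

For these assumed parameters, evaluating `rho(i, C, GAMMA, XI, M_LOG)` for increasing positive `i` gives values above 1 that grow with `i`, in line with the two properties stated above.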

2.4 Effect of $\gamma$ on the stationary distribution

We show in this section that the mass of the stationary distribution
of the step size can be concentrated about the central step size to an
arbitrary extent by making $\gamma$ sufficiently close to unity. To show this,
we first put together, from the results of the preceding two sections, a
rather explicit bound on the stationary probability of the step size
exceeding a particular value for a given $\gamma$, i.e.,
$\Pr_s[\Delta > C\gamma^i]$. This bound is in a form that allows direct
comparison with the corresponding probability arising from the choice of
$\gamma' = \sqrt{\gamma}$. By successively taking $\gamma$ to be the square root
of the preceding value, the bound on the probability can be made as small
as desired. This procedure for proving the assertion is similar to the one
we developed in Ref. 1. We restrict our attention to step sizes that exceed
the central step size, i.e., $i > 0$, since a parallel argument holds for
$i < 0$.

In the following discussion the quantity $(m_N - m_1 - 2)$ arises
frequently, and it is convenient to denote this quantity by the symbol
$\nu$. Clearly, $\nu$ is a measure of the spread in the log of the multipliers.
For $i > 0$ and $r = r(i)$, we have from eq. (29) that

$$\Big(\sum_k \lambda_k\Big)\sum_{j=i+\nu}^{\infty} p(j) \le \sum_{j=i}^{\infty}\lambda' P_j \le \lambda' P_i\sum_{j=0}^{\infty}\Big(\frac{1}{r}\Big)^{j} = \lambda' P_i\,\frac{r}{r-1}. \tag{34}$$

Now

$$r \ge \rho(i),$$

where $\rho(i)$ is defined in eq. (33), and

$$\frac{\lambda' P_i}{\sum_k \lambda_k} \le \max[p(i), \ldots, p(i+\nu)]. \tag{35}$$

Since

$$\Pr_s[\Delta \ge C\gamma^{i+\nu}] = \sum_{j=i+\nu}^{\infty} p(j),$$

we have, from eq. (34),

$$\Pr_s[\Delta \ge C\gamma^{i+\nu}] \le \frac{\rho(i)}{\rho(i) - 1}\,\max[p(i), \ldots, p(i+\nu)]. \tag{36}$$



Finally, from eq. (30), for $i \ge \nu + 1$,



max 



CpW, • ••,?(*" + ">^[^y] " 



(37) 



Equations (36) and (37) together give us the desired bound on the
stationary probability of the step size exceeding a given value, which
we now compare with a similar bound that holds for $\gamma' = \sqrt{\gamma}$. The
prime superscript is used on symbols to denote the functional dependence
of the associated quantities on $\gamma'$. In establishing the reference,
i.e., central, step size corresponding to $\gamma'$, minor differences exist
depending on whether [see eqs. (8) and (9)]

$$\text{(i)}\quad B(\gamma^{i-1}) > 0 \ge B(\gamma^{i-1/2}) \qquad \text{or} \qquad \text{(ii)}\quad B(\gamma^{i-1/2}) > 0 \ge B(\gamma^{i}). \tag{38}$$

We consider only (ii), in which case $\omega'(n) = 2i \Leftrightarrow \omega(n) = i$,
and all the transition probabilities are simply related:
$\psi^{(r)\prime}(2i) = \psi^{(r)}(i)$. As a consequence of the latter property,
we have

$$\rho'(2i) = \rho(i). \tag{39}$$






Repeating the arguments leading to eqs. (36) and (37), we have

$$\Pr{}_s'[\Delta \ge C\gamma'^{\,2i+\nu}] \le \frac{\rho'(2i)}{\rho'(2i) - 1}\,\max[p'(2i), \ldots, p'(2i+\nu)] \tag{40}$$

and 

max £p'(2i), ■■-, p'(2i + ,)] <, [^j J ^ ( 41 > 

By the fact that $\rho'(2i) = \rho(i)$, we have

«W*a^|*^[jJjP[jS,f. (42) 

Comparison with eqs. (36) and (37) completes the demonstration. 

III. TRANSIENT RESPONSE 

In this section, we are interested in the random time, called the 
adaptation time, taken for the step size of the quantizer to adapt from 
some arbitrary initial value to the central step size. It is necessary to 
have the adaptation time relatively small if the quantizer is to ade- 
quately track the scale variations of the input process. Also, it is 
reasonable to expect that, as $\gamma$ is made large, the increased range of
the multipliers [eq. (7)] will give the desired tracking. However, as a
counterbalance, we already know from the preceding section that, with
the correct choice of the log of the multipliers, $\{m_i\}$, the quality of
steady-state performance is increasingly impaired as the value of $\gamma$ is
raised. From this brief discussion (see Ref. 1 for a more detailed 
discussion), it is clear that it is useful to have formulas for the efficient 
computation of the mean adaptation time and bounds that provide 
insight on the dependence of the time on the multipliers. 

3.1 Mean time for first passage to the central state 

We consider only the saturating adaptive quantizer since, as K 
and L are made large, the quantities obtained for this model approxi- 
mate corresponding quantities for the ordinary adaptive quantizer. 
Also, for the usual reason only the case of positive initial states, 
co(0) > 0, is considered. 

Let the initial step $\omega(0) = i > 0$ and let $T(i)$ denote the mean
value of the random time $\tau$, where $\omega(\tau) \le 0$ and $\omega(n) > 0$ for all
$n < \tau$. It can be shown that, as a consequence of the recurrence and
irreducibility of the Markov chain (see Appendix A), the mean first
passage time, $T(i)$, is finite with probability 1. If the first transition
results in a transition to the state $i + m_r$, the process continues as
if the initial state had been $i + m_r$. The conditional expectation of the
first passage time is therefore $T(i + m_r) + 1$. From this argument, we
deduce that the following recursion is satisfied by the mean first
passage time,

\[ T(i) = \sum_{r=1}^{N} b^{(r)}(i)\,\{T(i + m_r) + 1\}, \qquad -m_1 + 1 \le i \le L - m_N, \tag{43} \]

where, as in eq. (16), $b^{(r)}(i) = F(\xi_r C\gamma^i) - F(\xi_{r-1} C\gamma^i)$. Of course,
$\sum_{r=1}^{N} b^{(r)}(i) = 1$. The recursive relation in (43) may be used to
generate the entire sequence $\{T(i)\}$, provided $(m_N - m_1)$ initial
conditions can be found. Now, by the same argument that led to eq.
(43), we have

\[ T(1 + m_1) = T(2 + m_1) = \cdots = T(0) = 0. \tag{44} \]

The remaining $m_N$ initial conditions, namely,

\[ T(1), T(2), \ldots, T(m_N), \]

are harder to obtain, and it is necessary to look more deeply into the
dynamics of the process to obtain these quantities.
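
As a sketch of how the recursion (43) generates further terms once the initial conditions are known, it can be solved for the highest-index term. The rearrangement below assumes that $b^{(N)}(i) > 0$ and that $m_N$ is strictly larger than every other $m_r$; if several multipliers shared the value $m_N$, their probabilities would be grouped in the denominator.

```latex
% Eq. (43) solved for T(i + m_N), valid at any i where (43) holds
T(i + m_N) \;=\; \frac{1}{b^{(N)}(i)}
  \Bigl[\, T(i) - 1 - \sum_{r=1}^{N-1} b^{(r)}(i)\, T(i + m_r) \Bigr]
```

Every term on the right has an index below $i + m_N$, so each application extends the known part of the sequence by one state.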

For every time instant, we define the $L$-dimensional vector $z(n)$ with
components $z(j; n)$, $1 \le j \le L$, where

\[ z(j; n) = \Pr[\omega(n) = j \text{ and } \omega(s) \ge 1 \text{ for all } s \le n]. \tag{45} \]

We show in Appendix D that the vectors $z(n)$ evolve in time according
to the homogeneous equation

\[ z(n + 1) = Dz(n), \qquad n \ge 0, \tag{46} \]

where $D$ is an $L \times L$ matrix. Also, in Appendix D we prove the
following: For $i \ge 1$,

\[ T(i) = \sum_{j=1}^{L} x_j^{(i)}, \quad \text{where} \quad [I - D]\,x^{(i)} = e^{(i)}, \tag{47} \]

and the elements of the $L$-vector $e^{(i)}$ are zero everywhere except at
the $i$th location where the element is unity. It is shown in Appendix D
that $[I - D]$ is nonsingular.

The simple recursion in (43) may be used to generate the sequence
$\{T(i)\}$ after obtaining the nonzero initial conditions via $m_N$ inversions,
as in (47). Alternatively, if $T(i)$ is required for only a few particular
values of $i$, it may be easier to obtain them via the inversions in (47).
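
The inversion in (47) can be illustrated with a minimal sketch. It assumes a unit-variance Gaussian input, a uniform quantizer ($\xi_r = r$), and the saturating boundary at $L$ described in Appendix D; the values `c = 0.586` and `L = 40` and all function names are illustrative choices, not taken from the paper.

```python
import math

def abs_gauss_cdf(x):
    """F(x) = Pr[|X| <= x] for a unit-variance Gaussian input (assumed)."""
    return math.erf(x / math.sqrt(2.0)) if x > 0 else 0.0

def level_probs(i, xi, c, gamma):
    """Return [b^(1)(i), ..., b^(N)(i)] for step size c * gamma**i."""
    step = c * gamma ** i
    psi = [0.0] + [abs_gauss_cdf(t * step) for t in xi] + [1.0]
    return [psi[r] - psi[r - 1] for r in range(1, len(psi))]

def build_D(m, xi, c, gamma, L):
    """D[j-1][i-1] = Pr[state i -> state j], restricted to states 1..L.
    Moves above L saturate at L; moves to states <= 0 leave the set."""
    D = [[0.0] * L for _ in range(L)]
    for i in range(1, L + 1):
        for m_r, b in zip(m, level_probs(i, xi, c, gamma)):
            j = min(i + m_r, L)
            if j >= 1:
                D[j - 1][i - 1] += b
    return D

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[k]] for k, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def mean_first_passage(m, xi, c, gamma, L, i):
    """T(i) = sum_j x_j, where [I - D] x = e^(i), as in eq. (47)."""
    D = build_D(m, xi, c, gamma, L)
    A = [[(1.0 if r == k else 0.0) - D[r][k] for k in range(L)]
         for r in range(L)]
    e = [1.0 if k == i - 1 else 0.0 for k in range(L)]
    return sum(solve(A, e))

m = [-1, -1, 2, 5]        # UQ, 3 bits, No. 1 (Table I)
xi = [1.0, 2.0, 3.0]      # uniform quantizer: xi_r = r
T = [mean_first_passage(m, xi, 0.586, 1.12, 40, i) for i in range(1, 11)]
```

The values in `T` can be checked against the first-step relation $T(i) = 1 + \sum_r b^{(r)}(i)\,T(i + m_r)$, with $T(k) = 0$ for $k \le 0$, at states far enough below $L$ that saturation plays no part.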

The bulk of the equations in (47) [see eq. (72)] are in the form
encountered in the analysis of the stationary distribution, eq. (21).
Also, the elements of the vectors $x^{(i)}$ are all nonnegative. Hence, by
applying the techniques and results of the preceding section, we may
draw certain conclusions about eq. (47).

First, the bandwidth of the matrix $[I - D]$ may be reduced by 1
by carrying out the reduction of the equations described in Section
2.1. For $m_1 = -1$ and arbitrary values of $m_2, \ldots, m_N$, this step is
enough to triangularize the matrix $[I - D]$ for any countable $L$ and
thus substantially simplify the computations. Second, we may conclude
from Section 2.2 that, with increasing $j$, the solution elements
$x_j^{(i)}$ decrease at least geometrically. This is a very useful property from
the point of view of numerical inversion of $[I - D]$ for $L$ large and
the approximation of the solution for $L = \infty$ by finite $L$.

3.2 A bound on the mean first passage time 

Let $T(i, j)$, $0 \le i < j$, denote the following mean first passage
time: the initial state is $\omega(0) = j$, first crossing occurs after $\tau$ transitions
if $\omega(\tau) \le i$ and $\omega(n) > i$ for all $n < \tau$, and $T(i, j) = E(\tau)$. The
quantity $T(j)$ of the preceding section is equivalent in our present
notation to $T(0, j)$. We now give an explicit bound on $T(i, j)$ that
provides some insight into the dependence of $T(i, j)$ on the multipliers.

For both the ordinary and saturating adaptive quantizer,

\[ T(i, j) \le \frac{1}{C(i + 1)}\,[(j - i) - (m_1 + 1)], \qquad 0 \le i < j, \]

where

\[ C(i) = \sum_{r=1}^{N-1} (m_{r+1} - m_r)\,\psi^{(r)}(i) - m_N. \tag{48} \]

From the definition of the central state, eq. (18), and the monotonicity
of $\psi^{(r)}(i)$ with respect to $i$, we observe that for $i > 0$, $C(i)$ is positive
and monotonically increasing with $i$. We only sketch the proof of (48) because
the method of the proof is contained in the proof of the bound that
we gave in Ref. 1 for the two-bit quantizer. First, recall [eq. (19)]
that a supermartingale property exists that holds for both types
of quantizers, according to which there is a net drift to the left
from all states $j > 0$. Second, we define a new process in which
$\omega'(n) = \omega(n) + nC(i + 1)$ and show that the supermartingale property,
i.e., $E[\omega'(n + 1) \mid \omega'(n)] \le \omega'(n)$, is preserved for the range of $n$
of interest. Finally, an application of Doob's theorem on optional
stopping of supermartingales (Ref. 10) on the new process yields the bound
ineq. (48).
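
The bound is inexpensive to evaluate. The sketch below computes $C(i)$ and the right-hand side of ineq. (48), assuming a unit-variance Gaussian input so that $\psi^{(r)}(i) = F(\xi_r C\gamma^i)$ is an error function; the function names and the value `c = 0.586` used for the central step size are hypothetical.

```python
import math

def psi(r, i, xi, c, gamma):
    """psi^(r)(i) = F(xi_r * c * gamma**i), F being the distribution
    function of |x| for a unit-variance Gaussian input (assumed)."""
    return math.erf(xi[r - 1] * c * gamma ** i / math.sqrt(2.0))

def drift_C(i, m, xi, c, gamma):
    """C(i) of eq. (48): sum over r = 1..N-1 of
    (m_{r+1} - m_r) * psi^(r)(i), minus m_N."""
    n = len(m)
    total = sum((m[r] - m[r - 1]) * psi(r, i, xi, c, gamma)
                for r in range(1, n))
    return total - m[-1]

def passage_bound(i, j, m, xi, c, gamma):
    """Right-hand side of ineq. (48); it requires C(i + 1) > 0."""
    drift = drift_C(i + 1, m, xi, c, gamma)
    if drift <= 0:
        raise ValueError("C(i + 1) must be positive")
    return ((j - i) - (m[0] + 1)) / drift

m = [-1, -1, 2, 5]       # UQ, 3 bits, No. 1 (Table I)
xi = [1.0, 2.0, 3.0]     # uniform quantizer: xi_r = r
bound_10 = passage_bound(0, 10, m, xi, 0.586, 1.12)
bound_20 = passage_bound(0, 20, m, xi, 0.586, 1.12)
```

With $m_1 = -1$ the term $(m_1 + 1)$ vanishes, so the bound is exactly proportional to the distance $(j - i)$.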

The bound provides some insight into the dependence of the mean
adaptation times on the multipliers, and $\gamma$ in particular, when the
initial and final step sizes are $C\gamma^j$ and $C$, respectively. Briefly, consider
the effect of making $\gamma' = \sqrt{\gamma}$, i.e., $M_i' = \sqrt{M_i}$, so that the spread of the
multipliers is reduced. The number of states between the states
corresponding to $C\gamma^j$ and $C$ is doubled. Now $C(1)$ is hardly affected by
the transformation and, as a consequence of the linear dependence of
the bound on $T(i, j)$ on the distance $(j - i)$, we have the bound on
the mean adaptation time approximately doubled. For $i = 0$ and
$j \gg (-m_1)$, computations amply corroborate this conclusion.
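
The doubling can be written out in two lines. The symbol $\Delta_0$ for the fixed initial step size is introduced here only for illustration, and the second line assumes, as stated above, that $C(1)$ is essentially unchanged by the transformation.

```latex
% Number of states between the initial step size \Delta_0 and the central step size C
j  = \log_{\gamma}\frac{\Delta_0}{C}, \qquad
j' = \log_{\sqrt{\gamma}}\frac{\Delta_0}{C}
   = \frac{\ln(\Delta_0/C)}{\tfrac{1}{2}\ln\gamma} = 2j .

% Ratio of the bounds in (48) with i = 0
\frac{j' - (m_1 + 1)}{j - (m_1 + 1)} = \frac{2j - (m_1 + 1)}{j - (m_1 + 1)}
  \approx 2 \quad \text{for } j \gg (-m_1).
```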

IV. COMPUTATIONAL RESULTS 

We present here a sampling of rather extensive computations done
on three- and four-bit adaptive quantizers ($N = 4$ and 8, respectively)
for independent identically distributed input sequences with gaussian
distributions. Both uniform, i.e., $\xi_i = i$, and nonuniform quantizers
were considered. Max (Ref. 8) has shown in the nonadaptive framework that
optimal nonuniform quantizers can yield an improvement in the
signal-to-noise ratio of about 20 percent over optimal uniform quantizers
with the number of bits in the range of interest here. We note
that four-bit adaptive quantizers have been breadboarded in Bell
Laboratories (Ref. 3), and that Jayant's systematic numerical study (Ref. 2) is
restricted to uniform quantizers up to three bits. We also observe that a
simple search procedure for the "optimal" set of multipliers grows to
be almost unmanageable and expensive when the dimension of the
parameter space is 8.

Table I lists five quantizers with their respective parameters $\{m_i\}$.
The parameter $\gamma$ is not considered part of the characterization of the
quantizer type. Among the quantizers investigated, the following five
proved to be the most interesting in their respective classes, specified
by number of bits and uniform or nonuniform. The first of the five,
with $\gamma \approx 1.12$, is close to what Jayant calls the optimal, three-bit
quantizer. The parameters $\{m_i\}$ were arrived at by the procedure
described in remark (i), Section 1.2.

Table I — Five quantizers 



Designation          Uniform or    Number     $\{\log_\gamma(M_i)\}$: $m_1, \ldots, m_N$
                     Nonuniform    of Bits

UQ, 3 bits, No. 1    Uniform       3          -1, -1, 2, 5
UQ, 3 bits, No. 2    Uniform       3          -1, 0, 1, 4
NUQ, 3 bits          Nonuniform    3          -2, -1, 2, 8
UQ, 4 bits           Uniform       4          -2, -2, 0, 0, 2, 5, 10, 17
NUQ, 4 bits          Nonuniform    4          -2, -2, 0, 0, 1, 2, 5, 16

The optimum division of the horizontal axis in Fig. 1, given by
$\xi_i$, $i = 1, 2, \ldots, (N - 1)$, was obtained from Max (Ref. 8), and we reproduce
these parameters for the reader's benefit.

NUQ, 3 bits. $\{\xi_i\} = \{1.0, 2.097, 3.492\}$.

NUQ, 4 bits. $\{\xi_i\} = \{1.0, 2.023, 3.097, 4.256, 5.565, 7.142, 9.299\}$.

Table II lists some statistics of the stationary step-size distribution
for unit variance of the input distribution. The stationary distribution
was obtained by solving the stationary equations of the saturating
adaptive quantizers with suitably large saturating levels
($K + L \approx 100$). We also give the stationary step-occupancy probabilities
$q_i$, where $q_i = \Pr[\xi_{i-1}\Delta(n) \le |x(n)| < \xi_i\Delta(n)]$, as in eq.
(26). Table II also gives, for purposes of comparison, corresponding
quantities of the optimal nonadaptive quantizer obtained from Max (Ref. 8).
In particular, $\Delta$ (Max) is the optimal, nonadaptive step size.
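
A stationary distribution of this kind can be approximated with a few lines of code. The sketch below is not the direct solution of the stationary equations used for Table II: it substitutes an averaged power iteration, which is simpler to write down. It assumes a unit-variance Gaussian input and a uniform quantizer, and `c = 0.586`, `K = L = 30`, and the function names are illustrative assumptions.

```python
import math

def level_probs(i, xi, c, gamma):
    """[b^(1)(i), ..., b^(N)(i)] for step size c * gamma**i
    and a unit-variance Gaussian input (assumed)."""
    step = c * gamma ** i
    psi = [0.0] + [math.erf(t * step / math.sqrt(2.0)) for t in xi] + [1.0]
    return [psi[r] - psi[r - 1] for r in range(1, len(psi))]

def one_step(p, m, probs, K, L):
    """Advance the state distribution by one transition,
    saturating at the end states -K and L."""
    q = [0.0] * len(p)
    for idx, mass in enumerate(p):
        i = idx - K
        for m_r, b in zip(m, probs[idx]):
            j = min(max(i + m_r, -K), L)
            q[j + K] += mass * b
    return q

def stationary(m, xi, c, gamma, K, L, tol=1e-13, max_iter=100000):
    """Iterate p <- (p + pP)/2 until successive iterates agree to tol.
    The averaging damps any near-periodic oscillation of the plain
    iteration and leaves the stationary distribution unchanged."""
    n = K + L + 1
    probs = [level_probs(i, xi, c, gamma) for i in range(-K, L + 1)]
    p = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = one_step(p, m, probs, K, L)
        avg = [0.5 * (u + v) for u, v in zip(p, nxt)]
        change = max(abs(u - v) for u, v in zip(p, avg))
        p = avg
        if change < tol:
            break
    return p

def step_stats(p, c, gamma, K):
    """Mean and standard deviation of the step size c * gamma**i."""
    sizes = [c * gamma ** (idx - K) for idx in range(len(p))]
    mean = sum(w * s for w, s in zip(p, sizes))
    var = sum(w * (s - mean) ** 2 for w, s in zip(p, sizes))
    return mean, math.sqrt(var)

m = [-1, -1, 2, 5]       # UQ, 3 bits, No. 1 (Table I)
xi = [1.0, 2.0, 3.0]     # uniform quantizer: xi_r = r
p = stationary(m, xi, 0.586, 1.12, 30, 30)
mean_step, sd_step = step_stats(p, 0.586, 1.12, 30)
```

The averaging step is a deliberate design choice: for this multiplier set every transition changes the state by 2 modulo 3, so the unaveraged iteration can oscillate for a very long time before the saturating end states break the cycle.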

Figures 2 to 5 show the mean adaptation times for inputs with 
unit variance. Figures 2 and 3 are concerned with the three types of 
three-bit quantizers for various values of $\gamma$. These figures plot the
mean time taken by the quantizers to adapt to the central, and 
optimal, step size for various values of the initial step size. In Fig. 2, 
the initial step size exceeds the central step size, while the reverse case 
is considered in Fig. 3. Similarly, Figs. 4 and 5 plot data on the mean 
adaptation times for the uniform and nonuniform four-bit quantizers. 

The purpose of the remaining tables (III to V) is to give the reader a 
feel for the relative performance of the five quantizers. We measure per- 
formance by the ratio of the input signal energy to the quantization 



Table II — Statistics of the stationary step-size distributions

For each type, the first line of step occupancy probabilities is for the
adaptive quantizer and the second line is for the optimal nonadaptive quantizer.

Type                 $\gamma$    $\Delta$ (Max)   $E(\Delta)$   $\sigma(\Delta)$   Step Occupancy Probabilities

UQ, 3 bits, No. 1    1.04    0.586    0.594    0.105    {0.445, 0.310, 0.156, 0.089}
                                                        {0.442, 0.317, 0.162, 0.078}

UQ, 3 bits, No. 2    1.04    0.586    0.613    0.089    {0.458, 0.314, 0.152, 0.075}
                                                        {0.442, 0.317, 0.162, 0.078}

NUQ, 3 bits          1.04    0.501    0.522    0.114    {0.396, 0.317, 0.198, 0.088}
                                                        {0.383, 0.323, 0.213, 0.081}

UQ, 4 bits           1.04    0.335    0.366    0.095    {0.285, 0.244, 0.182, 0.121, 0.075, 0.043, 0.024, 0.027}
                                                        {0.263, 0.235, 0.188, 0.135, 0.086, 0.049, 0.025, 0.019}

NUQ, 4 bits          1.04    0.258    0.279    0.066    {0.219, 0.205, 0.178, 0.145, 0.110, 0.076, 0.045, 0.022}
                                                        {0.204, 0.195, 0.177, 0.152, 0.121, 0.086, 0.049, 0.016}

[Figure 2: mean adaptation time versus initial step size, for initial step sizes above the central step size. Curve labels: UQ No. 1 at $\gamma$ = 1.06, 1.12, 1.24; UQ No. 2 at $\gamma$ = 1.12, 1.24; NUQ at $\gamma$ = 1.03, 1.06. Step sizes 0.5006 and 0.586 are marked on the horizontal axis.]

Fig. 2 — Transient response of three three-bit quantizers. 

error energy. Unlike all previous data, the data for these tables were
obtained by Monte Carlo simulation. The interval of time over which
performance was monitored is denoted by $N_A$. Thus, signal energy is
$\sum_{n=1}^{N_A} x^2(n)$. The remaining parameter in the tables is the initial step
size, $\Delta(\text{initial})$. However, we do not list the raw initial step size, but
Table III* — S/N performance of two uniform three-bit quantizers 

(Main numbers are for UQ, three bits, No. 1; numbers in ( ) 

for UQ, three bits, No. 2) 



log{$\Delta(\text{initial})/\Delta$}    $N_A$ = 10        $N_A$ = 100      $N_A$ = 1000     $N_A$ = 10,000

-1                      6.92 (5.84)      14.4 (14.8)     17.4 (19.3)     17.7 (20.1)
 0                      25.7 (27.6)      19.1 (21.4)     17.9 (20.4)     17.8 (20.2)
 1                      0.549 (0.549)    3.94 (3.99)     13.1 (14.3)     17.1 (19.2)



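* All logarithms in Tables III, IV, and V have base 10.

[Figure 3: mean adaptation time versus initial step size, for initial step sizes from 0.2 to 0.5, below the central step size. The value 0.5006 is marked on the horizontal axis.]

Fig. 3 — Transient response of three three-bit quantizers.

[Figure 4: mean adaptation time versus initial step size, for initial step sizes above the central step size. Curve labels: NUQ at $\gamma$ = 1.03, 1.06; UQ at $\gamma$ = 1.06, 1.12; one further curve at $\gamma$ = 1.12. The value 0.2582 is marked on the horizontal axis.]
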
Fig. 4 — Transient response of two four-bit quantizers. 

the more relevant quantity $\Delta(\text{initial})/\Delta$, where $\Delta$ is, as usual, the
optimal nonadaptive step size. After experimenting, we arrived at the
following values of $\gamma$ for the five quantizers, since they gave a suitable
mix of performances over short ($N_A$ small) and long ($N_A$ large) runs.

Table IV — S/N performance of nonuniform three-bit quantizer 

(NUQ, three bits)

log{$\Delta(\text{initial})/\Delta$}    $N_A$ = 10    $N_A$ = 100    $N_A$ = 1000    $N_A$ = 10,000

-1                      5.81         16.0          21.2           22.0
 0                      29.8         23.8          22.4           22.1
 1                      1.12         7.00          18.2           21.6

[Figure 5: mean adaptation time versus initial step size for the four-bit quantizers. Curve labels: NUQ and UQ, each at $\gamma$ = 1.03 and 1.06. Horizontal-axis marks: 0.2, 0.2582, 0.3, 0.4.]

Fig. 5 — Transient response of two four-bit quantizers. 

For a particular input process, the relative weightings may be quite 
different, and $\gamma$ may then be tuned accordingly.



Quantizer            $\gamma$

UQ, 3 bits, No. 1    1.12
UQ, 3 bits, No. 2    1.12
NUQ, 3 bits          1.06
UQ, 4 bits           1.06
NUQ, 4 bits          1.06



The following observations may be made on the above results. 
There is a pronounced asymmetry in performance with respect to
log{$\Delta(\text{initial})/\Delta$} over short runs ($N_A$ = 10 or 100). This is, of

Table V — S/N performance of uniform and nonuniform four-bit quantizers

(Main numbers are for UQ, four bits; numbers in ( ) for NUQ, four bits)

log{$\Delta(\text{initial})/\Delta$}    $N_A$ = 10        $N_A$ = 100       $N_A$ = 1000      $N_A$ = 10,000

-1                      19.62 (21.65)    36.98 (47.30)    48.22 (67.35)    48.97 (71.50)
 0                      86.2 (111.0)     56.0 (80.1)      50.60 (72.50)    49.20 (71.90)
 1                      2.97 (4.86)      17.7 (27.6)      42.00 (62.00)    48.10 (70.30)



course, related to the contraction multipliers being grossly smaller 
than the expansion multipliers in all the quantizers considered (Table 
I). The S/N when $\Delta(\text{initial})/\Delta = 1$ and $N_A = 10$ is close to the S/N
obtained with the step size optimally tuned to the known level of
scaling of the input sequence. The steady but not excessive deterioration
in performance with increasing $N_A$ is the price paid for adaptability:
it is due to the fluctuations in step size arising from the random
walk. Finally, we observe from Table V that there is a striking gain 
from nonuniform quantization, the extent of the gain being somewhat 
greater than what may be expected from previous results on non- 
adaptive quantizers. 
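
An experiment of the kind summarized in Tables III to V can be sketched as follows. This is a minimal illustration and not the simulation used for the tables: it assumes the ordinary (unbounded) uniform quantizer with $\xi_r = r$, a unit-variance Gaussian input, and reconstruction levels at $(r - 1/2)\Delta$, the last being an assumption because the quantizer characteristic of Fig. 1 is not reproduced in this section. The function name and seeds are hypothetical.

```python
import math
import random

def simulate_snr(m, gamma, delta0, n_samples, seed=0):
    """Ratio of input signal energy to quantization error energy
    over n_samples inputs, starting from step size delta0."""
    rng = random.Random(seed)
    n_levels = len(m)
    delta = delta0
    signal = 0.0
    error = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        # Magnitude level r in 1..N: (r - 1) * delta <= |x| < r * delta,
        # with everything beyond the last threshold mapped to level N.
        r = min(int(abs(x) / delta) + 1, n_levels)
        xq = math.copysign((r - 0.5) * delta, x)   # assumed output level
        signal += x * x
        error += (x - xq) ** 2
        delta *= gamma ** m[r - 1]                 # step-size adaptation
    return signal / error

m = [-1, -1, 2, 5]                       # UQ, 3 bits, No. 1 (Table I)
snr_long = simulate_snr(m, 1.12, 0.586, 5000, seed=1)
snr_short_tuned = simulate_snr(m, 1.12, 0.586, 10, seed=1)
snr_short_large = simulate_snr(m, 1.12, 5.86, 10, seed=1)
```

The three runs correspond to a long run, a short run started at the optimal nonadaptive step size, and a short run started a factor of 10 too large, which is the case in which the slow contraction multipliers penalize performance most.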

APPENDIX A 

Existence and Uniqueness of the Stationary Distribution 

We establish in this appendix that, for independent identically 
distributed inputs, there exists a unique, finite stationary step-size 
distribution (invariant measure). The proof given here is via the 
construction of a stochastic Liapunov function, and it relies on a 
standard, unified theory of stochastic stability (Refs. 11 and 12) that is well known.
The stochastic stability of the adaptive quantizer has been proved by
Goodman and Gersho (Ref. 4), and the prime reason for including an alternative
proof is our belief that familiarity with the method followed here
may be beneficial to future workers in adaptive processes. The positive 
function that is proved to be a stochastic Liapunov function here is 
identical to the function that worked in Ref. 1 for the two-bit quan- 
tizer, and the proof is a straightforward generalization. 

We consider in turn two properties of well-behaved Markov chains, 
namely, irreducibility and recurrence. 

A.1 Irreducibility 

The Markov chain is irreducible if and only if every state communicates
with both neighboring states. This occurs if and only if
there exist nonnegative integers $n_i$ and $n_i'$, $1 \le i \le N$, such that

\[ \sum_{i=1}^{N} m_i n_i = 1 \tag{49} \]

and

\[ \sum_{i=1}^{N} m_i n_i' = -1. \tag{50} \]

It is an elementary fact from Euclid's theory that this occurs if and
only if the integers $\{m_i\}$ are relatively prime, i.e., their greatest
common divisor is unity.
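
The criterion is easy to check for a given multiplier set. The short sketch below applies it to the five quantizers of Table I; it assumes, as in the text, that each set contains both negative and positive values, and the function name is a hypothetical one.

```python
from functools import reduce
from math import gcd

def is_irreducible(m):
    """True when the greatest common divisor of the integers m_i is 1."""
    return reduce(gcd, (abs(v) for v in m)) == 1

table_I = {
    "UQ, 3 bits, No. 1": [-1, -1, 2, 5],
    "UQ, 3 bits, No. 2": [-1, 0, 1, 4],
    "NUQ, 3 bits": [-2, -1, 2, 8],
    "UQ, 4 bits": [-2, -2, 0, 0, 2, 5, 10, 17],
    "NUQ, 4 bits": [-2, -2, 0, 0, 1, 2, 5, 16],
}
results = {name: is_irreducible(m) for name, m in table_I.items()}
```

A set such as `[-2, 2, 4]` fails the test, since only every second state could then be reached.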

A.2 Recurrence 

Consider the following nonnegative function of the states

\[ V(i) \triangleq |i|, \qquad i = 0, \pm 1, \pm 2, \ldots. \tag{51} \]

Let $D(i)$ be defined as

\[ D(i) = E[V(\omega(n + 1)) \mid \omega(n) = i] - V(i). \tag{52} \]

Now $D(i)$ is uniformly bounded from above. By the monotonicity
of $\psi^{(r)}(i)$ with respect to $i$ and the definition of the central state, (18),
we obtain, for all $i \ge (-m_1)$,

\[ D(i) = m_N - \sum_{r=1}^{N-1} (m_{r+1} - m_r)\,\psi^{(r)}(i) \le m_N - \sum_{r=1}^{N-1} (m_{r+1} - m_r)\,\psi^{(r)}(-m_1) < 0 \tag{53} \]

and, for all $i \le -m_N$,

\[ D(i) = -m_N + \sum_{r=1}^{N-1} (m_{r+1} - m_r)\,\psi^{(r)}(i) \le -m_N + \sum_{r=1}^{N-1} (m_{r+1} - m_r)\,\psi^{(r)}(-m_N) < 0, \tag{54} \]

where, as in eq. (23), $\psi^{(r)}(j)$ denotes $F(\xi_r C\gamma^j)$. Hence, by virtue of
eqs. (53) and (54), $D(i) \le -\epsilon < 0$ for all but a finite set of states $i$,
and $V(i)$ is a stochastic Liapunov function for the process.
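
The first equality in (53) follows from a summation by parts, sketched here under the conventions $\psi^{(0)}(i) = 0$ and $\psi^{(N)}(i) = 1$ used in Appendix C and the relation $b^{(r)}(i) = \psi^{(r)}(i) - \psi^{(r-1)}(i)$. For $i \ge -m_1$ every reachable state $i + m_r$ is nonnegative, so $V$ changes by exactly $m_r$.

```latex
D(i) = \sum_{r=1}^{N} m_r\, b^{(r)}(i)
     = \sum_{r=1}^{N} m_r \bigl\{ \psi^{(r)}(i) - \psi^{(r-1)}(i) \bigr\}
     = m_N\,\psi^{(N)}(i) - m_1\,\psi^{(0)}(i)
       - \sum_{r=1}^{N-1} (m_{r+1} - m_r)\,\psi^{(r)}(i)
     = m_N - \sum_{r=1}^{N-1} (m_{r+1} - m_r)\,\psi^{(r)}(i).
```

For $i \le -m_N$ every reachable state is nonpositive, $V$ changes by $-m_r$, and the same steps give the first equality in (54).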

From Kushner's Theorem 7 (Ref. 11), we have recurrence and we can infer
further, from Theorem 4, that there exists at least one finite invariant
measure, i.e., stationary distribution. Also, since we have shown
earlier that two or more disjoint self-contained subsets of the state
space do not exist, we have, from Theorem 5, at most one invariant
probability measure. The existence and uniqueness of a finite stationary
distribution for the step size of the ordinary adaptive quantizer are
therefore established.



A.3 The saturating adaptive quantizer 

The argument leading to irreducibility is intact. In addition, we
have here that the end states $(-K)$ and $L$ have period 1 and, since
periodicity is a class concept (i.e., every state in a particular
communicating class has the same periodicity), the entire Markov chain
is aperiodic and, consequently, there is a single ergodic class that
includes every state in the chain. Hence, the distribution at time
$n$, $p(n)$, approaches $p$, the stationary distribution, for all initial
distributions, and furthermore every component probability of $p$ is
strictly positive.

APPENDIX B 

The Saturating Adaptive Quantizer 

We give in this appendix a set of equations satisfied by the stationary 
probabilities of the states in the saturating adaptive quantizer. These 
equations are complete and reduced by the method described in 
Section 2.1. 

Let $\mu$ denote the number of contraction multipliers, i.e., multipliers
having values less than 1, so that

\[ m_1, \ldots, m_\mu < 0 < m_{\mu+1}, \ldots, m_N. \tag{55} \]

The tacit assumption that there are no multipliers exactly equal to
unity is by no means necessary, but does lead to a simpler presentation.
The main set of equations is

\[ \sum_{i=j-m_N+1}^{j} p(i) = \sum_{r=1}^{N-1} \sum_{i=j-m_{r+1}+1}^{j-m_r} \psi^{(r)}(i)\,p(i), \qquad -K + m_N - 1 \le j \le L + m_1. \tag{56} \]
The lower boundary equations are*

\[ \sum_{i=-K}^{j-1} p(i) = \sum_{r=1}^{s-1} \sum_{i=(-K)\wedge(j-m_{r+1})}^{j-m_r-1} \psi^{(r)}(i)\,p(i), \tag{57} \]

where $\mu + 1 \le s \le N$ and $-K + m_{s-1} + 1 \le j \le -K + m_s$. Finally,
the upper boundary equations are

\[ \sum_{i=j-m_N+1}^{j} p(i) = \sum_{r=s}^{N-1} \sum_{i=j-m_{r+1}+1}^{L\vee(j-m_r)} \psi^{(r)}(i)\,p(i), \tag{58} \]

where $1 \le s \le \mu$ and $L + m_s \le j \le L + m_{s+1} - 1$.

* $x \wedge y = \mathrm{Max}[x, y]$ and $x \vee y = \mathrm{Min}[x, y]$.



APPENDIX C 

Proofs of Lemmas 1 and 2 

C.1 Proof of Lemma 1 

(i) It can be shown that the determinant of the matrix $A_i$ is

\[ \det[A_i] = (-1)^{m_N - m_1}\,[1 - \psi^{(N-1)}(i)] / \psi^{(1)}(i + m_N - m_1 - 1). \]

As $\det[A_i] \ne 0$, $A_i^{-1}$ exists.

Since $P_i = A_i^{-1}P_{i+1}$, we observe from the structures of $P_i$ and $P_{i+1}$
that the matrix $A_i^{-1}$ is in companion form in that all rows except the
first reflect shift operations, i.e., for $k \ge 2$,

\[ [A_i^{-1}]_{k,l} = 0 \ \text{if } l \ne (k - 1), \qquad [A_i^{-1}]_{k,l} = 1 \ \text{if } l = (k - 1). \tag{59} \]

The elements of the first row of $A_i^{-1}$ are obtained from the equation

\[ \sum_{l=0}^{m_N-1} p(i + l) - \sum_{r=1}^{N-1} \sum_{l=m_N-m_{r+1}}^{m_N-m_r-1} \psi^{(r)}(i + l)\,p(i + l) = 0. \tag{24} \]

As the matrix $A_i^{-1}$ is in companion form, we know that its characteristic
polynomial is equal, to within a constant of proportionality, to
the polynomial obtained by replacing, in eq. (24), $p(i + l)$ by
$\mu^{m_N-m_1-1-l}$. That is, where

\[ C(\mu) = (-1)^{m_N-m_1-1} \det[A_i^{-1} - \mu I], \]

we have

\[ [1 - \psi^{(N-1)}(i)]\,C(\mu) = \sum_{l=0}^{m_N-1} \mu^{m_N-m_1-1-l} - \sum_{r=1}^{N-1} \sum_{l=m_N-m_{r+1}}^{m_N-m_r-1} \psi^{(r)}(i + l)\,\mu^{m_N-m_1-1-l}. \tag{60} \]

The quantity $[1 - \psi^{(N-1)}(i)]$ is merely the coefficient of $p(i)$ in
eq. (24).

Scanning the coefficients of the polynomial $C(\mu)$, we observe that
there is a single sign alternation and, hence, by Descartes' rule, $C(\mu)$
has at most one real positive root. Since

\[ C(0) = -\psi^{(1)}(i + m_N - m_1 - 1) / [1 - \psi^{(N-1)}(i)] < 0 \]

and $C(\mu) \to \infty$ as $\mu \to \infty$, there exists exactly one real positive root.
Let $r$ denote this root.

Now

\[ [1 - \psi^{(N-1)}(i)]\,C(1) = m_N - \sum_{r=1}^{N-1} \sum_{l=m_N-m_{r+1}}^{m_N-m_r-1} \psi^{(r)}(i + l) < m_N - \sum_{r=1}^{N-1} \sum_{l=m_N-m_{r+1}}^{m_N-m_r-1} \psi^{(r)}(i) = \sum_{r=1}^{N} m_r\{\psi^{(r)}(i) - \psi^{(r-1)}(i)\}, \tag{61} \]

where we have followed the usual convention in setting $\psi^{(N)}(i) = 1$
and $\psi^{(0)}(i) = 0$. So $C(1) < 0$ if $\sum_{r=1}^{N} m_r\{\psi^{(r)}(i) - \psi^{(r-1)}(i)\} \le 0$. The
latter condition holds for all $i \ge 0$ [see eqs. (17) and (18)]. Hence,
$r > 1$.

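(ii) Let us denote the elements of the first row of $A_i^{-1}$ by $\{\alpha_l\}$ and
$\{\beta_l\}$ so that the row appears as

\[ [\,-\alpha_1 \ \ -\alpha_2 \ \cdots \ -\alpha_{m_N-1} \ \ \beta_1 \ \ \beta_2 \ \cdots \ \beta_{-m_1}\,]. \tag{62} \]

One reason for expressing the row in this manner is that every $\alpha_l$ and
$\beta_l$ is strictly positive by eq. (24).

The left eigenvector $\lambda$ of $A_i^{-1}$ corresponding to the eigenvalue $r$
satisfies, by definition, $\lambda'A_i^{-1} = r\lambda'$. Examining the component
equations, we find that

\[ \lambda_{l+1} = (r^l + \alpha_1 r^{l-1} + \cdots + \alpha_l)\,\lambda_1, \qquad 1 \le l \le (m_N - 1). \tag{63} \]

Also, for $1 \le l \le (-m_1)$,

\[ \lambda_{m_N-m_1-l} = \frac{\lambda_1}{r^l}\,[\beta_{-m_1-l+1}\,r^{l-1} + \beta_{-m_1-l+2}\,r^{l-2} + \cdots + \beta_{-m_1}]. \tag{64} \]

Finally,

\[ \lambda_{m_N-m_1-1} = \frac{\beta_{-m_1}}{r}\,\lambda_1. \tag{65} \]

Since the $\alpha$'s and $\beta$'s are positive quantities, statement (ii) of the
lemma is true.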

(iii) The statement may be verified by examining the characteristic
polynomial $C(\mu)$ in eq. (60) and observing that the quantities $\psi^{(r)}(i)$
are monotonic, increasing with $i$.

C.2 Proof of Lemma 2

It is required to prove that, for $j \ge i > 0$,

\[ \lambda'(A_i^{-1} - A_j^{-1})P_{j+1} \le 0. \tag{66} \]

The matrices $A_j^{-1}$ and $A_i^{-1}$ are identical in all except the first row and
also $\lambda_1 > 0$. Equation (66) is therefore equivalent to*

\[ e_1'A_j^{-1}P_{j+1} \ge e_1'A_i^{-1}P_{j+1}. \tag{67} \]

We prefer to show that

\[ \theta^{(N-1)}(i)\,p(j) \ge \theta^{(N-1)}(i)\,e_1'A_i^{-1}P_{j+1}, \tag{68} \]

where $\theta^{(N-1)}(i) = \{1 - \psi^{(N-1)}(i)\} > 0$. As $e_1'A_j^{-1}P_{j+1} = p(j)$, the
lemma will then have been proved.

From eq. (24),

\[ \theta^{(N-1)}(j)\,p(j) = p(j) - \psi^{(N-1)}(j)\,p(j) = -\sum_{l=j+1}^{j+m_N-1} p(l) + \sum_{r=1}^{N-1} \sum_{l=j+m_N-m_{r+1}}^{j+m_N-m_r-1} \psi^{(r)}(l)\,p(l) - \psi^{(N-1)}(j)\,p(j) \tag{69} \]

and

\[ \theta^{(N-1)}(i)\,e_1'A_i^{-1}P_{j+1} = -\sum_{l=j+1}^{j+m_N-1} p(l) + \sum_{r=1}^{N-1} \sum_{l=j+m_N-m_{r+1}}^{j+m_N-m_r-1} \psi^{(r)}(i - j + l)\,p(l) - \psi^{(N-1)}(i)\,p(j). \tag{70} \]

Now

\[ \theta^{(N-1)}(i)\,p(j) - \theta^{(N-1)}(i)\,e_1'A_i^{-1}P_{j+1} \ge \sum_{r=1}^{N-1} \sum_{l=j+m_N-m_{r+1}}^{j+m_N-m_r-1} \{\psi^{(r)}(l) - \psi^{(r)}(i - j + l)\}\,p(l) - \{\psi^{(N-1)}(j) - \psi^{(N-1)}(i)\}\,p(j) \ge 0, \tag{71} \]

because of the monotonicity of $\psi^{(r)}(l)$, and the final term in the expression
on the right-hand side of (71) is cancelled by an identical component
($r = N - 1$, $l = j + m_N - m_{r+1}$) of the leading part. The
lemma is proved.

* The column vector with the leading element equal to unity and all other elements
equal to zero is denoted by $e_1$.

364 THE BELL SYSTEM TECHNICAL JOURNAL, FEBRUARY 1975 



APPENDIX D 

Two Equations Concerning Mean First-Passage Times 

We prove two assertions made in Section 3.1, eqs. (46) and (47),
concerning (i) the homogeneous evolution of the vectors $\{z(n)\}$ via
the matrix $D$ and (ii) the explicit formula for the mean first-passage
time, $T(i)$.

D.1 Derivation of eq. (46) 

Let $X_n$ denote the event $1 \le \omega(t) \le L$ for all $t$, $0 \le t \le n$.
Then, by definition,

\[ z(j; n) = \Pr[\omega(n) = j \text{ and } X_n], \qquad 1 \le j \le L. \]

Since it is also true that

\[ z(j; n) = \Pr[\omega(n) = j \text{ and } X_{n-1}], \]

we have

\[ z(j; n) = \sum_{i=1}^{L} \Pr[\omega(n) = j \mid \omega(n - 1) = i, X_{n-1}]\,z(i; n - 1). \]

We have obtained the quantities $\Pr[\omega(n) = j \mid \omega(n - 1) = i, X_{n-1}]$
for $1 \le i, j \le L$ and, thereby, the following equations. In the following,
$\mu$ denotes the number of contraction multipliers, that is,

\[ m_1, m_2, \ldots, m_\mu < 0 < m_{\mu+1}, \ldots, m_N. \]

The basic recursion is, for $m_N + 1 \le j \le L + m_1$,

\[ z(j; n) = \sum_{r=1}^{N} b^{(r)}(j - m_r)\,z(j - m_r; n - 1). \tag{72} \]

The initial boundary equations are

\[ z(j; n) = \sum_{r=1}^{\mu} b^{(r)}(j - m_r)\,z(j - m_r; n - 1), \qquad 1 \le j \le m_{\mu+1}, \tag{73} \]

\[ z(j; n) = \sum_{r=1}^{s} b^{(r)}(j - m_r)\,z(j - m_r; n - 1), \qquad m_s + 1 \le j \le m_{s+1}, \quad s = \mu + 1, \mu + 2, \ldots, (N - 1). \tag{74} \]

The final boundary equations are

\[ z(j; n) = \sum_{r=s}^{N} b^{(r)}(j - m_r)\,z(j - m_r; n - 1), \qquad L + m_{s-1} + 1 \le j \le L + m_s, \quad s = 2, 3, \ldots, \mu, \tag{75} \]

\[ z(j; n) = \sum_{r=\mu+1}^{N} b^{(r)}(j - m_r)\,z(j - m_r; n - 1), \qquad L + m_\mu + 1 \le j \le L - 1, \tag{76} \]

\[ z(j; n) = \sum_{r=\mu+1}^{N} \sum_{i=L-m_r}^{L} b^{(r)}(i)\,z(i; n - 1), \qquad j = L. \tag{77} \]

Equations (72) to (77) define the matrix $D$ stated in the main text.

D.2 Derivation of eq. (47) 

For $i = 1, 2, \ldots, L$, let

\[ f(i; n + 1) \triangleq \Pr[\text{first passage occurs at } (n + 1) \mid \omega(0) = i] = \Pr[\omega(n + 1) \le 0, X_n \mid \omega(0) = i] = \sum_{j=1}^{-m_1} \Pr[\omega(n + 1) \le 0 \mid \omega(n) = j]\,z(j; n), \tag{78} \]

with $z(0) = e^{(i)}$, the vector with every element equal to zero except
for the $i$th element, which is unity. The event $\omega(n + 1) = k \le 0$
conditioned on $\omega(n) = j$ is associated with a jump $= k - j$. The
following diagram illustrates the magnitudes of the jumps required for
passage.

States $1, \ldots, -m_\mu$:                        jump $\le m_\mu$
States $(-m_\mu + 1), \ldots, -m_{\mu-1}$:         jump $\le m_{\mu-1}$
...
States $(-m_2 + 1), \ldots, -m_1$:                 jump $\le m_1$

Equation (78) can be explicitly stated, thus,

\[ f(i; n + 1) = \sum_{j=1}^{-m_\mu} \psi^{(\mu)}(j)\,z(j; n) + \sum_{j=-m_\mu+1}^{-m_{\mu-1}} \psi^{(\mu-1)}(j)\,z(j; n) + \cdots + \sum_{j=-m_2+1}^{-m_1} \psi^{(1)}(j)\,z(j; n). \tag{79} \]

In the more convenient vector form,

\[ f(i; n + 1) = c'z(n), \tag{80} \]

where the coefficients of the $L$-dimensional column vector $c$ are obtained
from (79), and we observe that only the leading $(-m_1)$ elements
of $c$ are nonzero.

The important fact about the vector $c$ is that

\[ c' = 1'[I - D], \tag{81} \]

where $1$ is the vector with every element equal to unity. Equation
(81) may be established by either direct verification or by probabilistic
reasoning. Now

\[ T(i) = \sum_{n=0}^{\infty} (n + 1)\,f(i; n + 1) \]

\[ = c' \sum_{n=0}^{\infty} n\,z(n) + \sum_{n=0}^{\infty} f(i; n + 1) \]

\[ = c' \sum_{n=0}^{\infty} n\,z(n) + 1 \tag{82} \]

\[ = 1'[I - D] \sum_{n=0}^{\infty} n\,z(n) + 1 \quad \text{from (81)} \]

\[ = 1' \sum_{n=0}^{\infty} z(n) \tag{83} \]

\[ = 1' \Bigl[ \sum_{n=0}^{\infty} D^n \Bigr] z(0) \tag{84} \]

\[ = 1'[I - D]^{-1} z(0). \tag{85} \]

Equation (82) is obtained by noting that the probability that passage
occurs at finite time is unity. In obtaining eq. (83), we have used
$z(n + 1) = Dz(n)$ and the fact that $1'z(0) = 1$. The convergence of the series
$\sum D^n$ is a consequence of the fact that every eigenvalue of the matrix
$D$ is strictly inside the unit circle. We omit the proof of this assertion,
as it is similar to the proof given in Ref. 1 in connection with the
matrix $D$ for two-bit quantizers.

Equation (85) with $z(0) = e^{(i)}$ is the same as eq. (47) in the main
text.
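
The series form (83) can be checked numerically on a small example. The sketch below iterates $z(n + 1) = Dz(n)$ from $z(0) = e^{(i)}$ and accumulates $1'z(n)$; the matrix built by `toy_D` is a hypothetical two-multiplier chain chosen only because its answer is easy to verify, not one of the quantizers of Table I.

```python
def mean_time_by_series(D, i, tol=1e-14, max_terms=100000):
    """T(i) = 1' * sum_n z(n), with z(0) = e^(i) and z(n+1) = D z(n)."""
    L = len(D)
    z = [1.0 if k == i - 1 else 0.0 for k in range(L)]
    total = 0.0
    for _ in range(max_terms):
        mass = sum(z)        # 1' z(n): probability of no passage through time n
        total += mass
        if mass < tol:
            break
        z = [sum(D[j][k] * z[k] for k in range(L)) for j in range(L)]
    return total

def toy_D(L, q):
    """Hypothetical chain on states 1..L: from state k move to k - 1
    with probability q (leaving the set from state 1) and to
    min(k + 1, L) with probability 1 - q."""
    D = [[0.0] * L for _ in range(L)]
    for k in range(1, L + 1):
        if k - 1 >= 1:
            D[k - 2][k - 1] += q
        D[min(k + 1, L) - 1][k - 1] += 1.0 - q
    return D

D5 = toy_D(5, 0.7)
T5 = [mean_time_by_series(D5, i) for i in range(1, 6)]
T1_single = mean_time_by_series(toy_D(1, 0.7), 1)
```

For the single-state case the series is geometric, $\sum_n 0.3^n = 1/0.7$, which gives a closed-form check on the implementation.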